Abstract

We show that the class of conditional distributions satisfying the
Coarsening at Random (CAR) property for discrete data has a simple
and robust algorithmic description based on randomized uniform mul- 
ticovers: combinatorial objects generalizing the notion of partition of 
a set. However, the complexity of a given CAR mechanism can be 
large: the maximal "height" of the needed multicovers can be expo- 
nential in the number of points in the sample space. The results stem 
from a geometric interpretation of the set of CAR distributions as a 
convex polytope and a characterization of its extreme points. The 
hierarchy of CAR models defined in this way could be useful in parsi- 
monious statistical modelling of CAR mechanisms, though the results 
also raise doubts in applied work as to the meaningfulness of the CAR 
assumption in its full generality.

1 Introduction

In statistical practice one is often presented with incomplete, or more gener- 
ally, coarse data. To properly model such data, one needs to take into account 
the mechanism by which the data are coarsened. In practice the details of 
this coarsening mechanism are often unknown or computationally expensive 
to model. Therefore, it is of interest to determine conditions under which 
this mechanism can be safely ignored. The "coarsening at random" (CAR) 
assumption is the weakest condition giving this guarantee. It was identified
by Heitjan and Rubin (1991). More recently, Grünwald and Halpern (2003)
and Jaeger (2005b) have stressed that the importance of CAR is not restricted
to statistical applications: when updating a probability distribution
based on new information in learning, artificial intelligence, or other scientific 
applications, it precisely characterizes when one can ignore the distinction 
between the fact that an event has been observed, and the fact that an event 
has happened, thereby considerably simplifying the update process. 

Thus, both in statistical inference with coarsened data and for proba- 
bility updating in learning algorithms, it is attractive to be able to make 
the CAR assumption. In order to be able to judge whether or not the as- 
sumption is warranted, it is important to fully understand its meaning. Here 
we approach this problem by giving two intimately related characterizations 
of the CAR assumption. First, we show that the set of all CAR mech- 
anisms for a given finite sample space can be seen as a convex polytope. 
Each CAR mechanism is a mixture of CAR mechanisms which correspond 
to the vertices of the polytope. Our first main result, Theorem 1, characterizes
these vertices. Our second result, which follows easily from the first,
complements this geometric view with an algorithmic one. We show that a
simple probabilistic algorithm can simulate any possible CAR mechanism,
and only CAR mechanisms. Prompted by Gill et al. (1997), earlier authors
(Grünwald and Halpern, 2003; Jaeger, 2005b) have also searched for such
constructions, calling them procedural models for CAR. Yet the procedural 
models proposed so far are not quite satisfactory, because in all cases, 

1. The procedural model depends on parameters which have to be fine- 
tuned in order to guarantee the CAR property; or equivalently, 

2. A small perturbation in the parameters can destroy the CAR property. 



This "frailty" or lack of robustness is an indication that such procedures may
not occur naturally. In fact, Jaeger (2005b, Theorem 4.17) shows that the
only CAR mechanisms which a robust procedure can generate must be of a
special type known as "coarsening completely at random", CCAR.

Here we present a natural way to generate all CAR mechanisms, and only
CAR mechanisms, that does not require fine-tuning of parameters. Our algo- 
rithm works for arbitrary finite sample spaces. It is based on a generalization 
of the notion of a partition of a set which we call a uniform multicover, or 
just multicover for short. 

Superficially, its existence would seem to contradict Jaeger's theorem mentioned
above. But of course, a proven theorem does not allow any contradictions.
The difference lies in the notion of robustness, and of its negation, frailty,
that we use. Our result can be seen as a criticism of Jaeger's notion of
robustness, even though that notion does at first sight seem appealing and natural.
By parameterizing CAR distributions in a different manner, we obtain a
representation in which CAR can be generated without parameter tuning. In
a nutshell: we consider a discrete uniform distribution to be a robust and
natural object. Jaeger considers it to be an easily perturbed object.

We emphasize that the body of Jaeger's work remains highly relevant; 
this is just one of a number of important results he has obtained, and we too 
come to the conclusion that CAR mechanisms which are not CCAR will be 
very rare in practice. For instance, our final result, Theorem 3, shows that,
although no fine-tuning is needed, the complexity (defined in terms of the 
"height" of multicovers) of the CAR mechanisms generated by our algorithm 
can grow exponentially in the size of the sample space. 

The paper is organized as follows. In Section 2 we briefly introduce coarsening
at random and other preliminaries. In Section 3 we give our geometric
interpretation of CAR distributions (Theorem 1). In Section 4 we define
uniform multicovers and use these to define our procedural CAR model. We
show that it generates all and only CAR mechanisms (Theorem 2). In Section 5
we discuss our CAR model in detail. We show (Theorem 3) that it
gives rise to an exponential lower bound on the height of the multicovers
needed in Theorem 2. The proofs are given in the final section.

2 Preliminaries 

Let E be a finite non-empty set, containing n elements. A coarsening mechanism
is a probabilistic rule which replaces any point x in E with a subset A of
E containing x. Thus a coarsening mechanism is specified by a collection of
(conditional) probabilities $\pi_A^x$ such that for all x, $\sum_{A \ni x} \pi_A^x = 1$. Intuitively, x
is generated by some process which for simplicity we will refer to as 'Nature'.
But rather than observing x directly, the statistician observes a coarsening
of x, i.e. a set A containing x. We call x the underlying outcome and A
the corresponding observation. The coarsening mechanism determines the A
that is observed given x; $\pi_A^x$ is the probability of observing the set A with
$A \ni x$, given that Nature has generated x. We define the support of such a
coarsening mechanism as the set of $A \subseteq E$ for which $\pi_A^x > 0$ for some $x \in E$.

A coarsening mechanism satisfies the CAR (coarsening at random) property
if and only if for all $x, x' \in A$,

$$\pi_A^x = \pi_A^{x'} = \pi_A, \text{ say.} \qquad (1)$$


Intuitively, this means that the probability of observing A is the same for all
x that are contained in A: the coarsening is done 'at random', independently
of the underlying x. We note that (1) is the definition of CAR employed by
Gill et al. (1997). It is called "strong CAR" by Jaeger (2005a). The definition
is explained in detail by Gill et al. (1997) and Jaeger (2005a); motivation,
practical relevance and applications of the CAR property are discussed
extensively by Gill et al. (1997) and Grünwald and Halpern (2003).

Definition (1) shows that a CAR mechanism is specified by a collection
of probabilities $\pi_A$ indexed by the nonempty subsets A of E satisfying

$$\sum_{A \ni x} \pi_A = 1 \quad \forall x \in E. \qquad (2)$$

We can therefore represent a CAR mechanism by the vector
$\pi = (\pi_A : \emptyset \subset A \subseteq E)$, where we assume the subsets A to be ordered in some standard
manner. For a given finite set of CAR mechanisms $\pi_1, \ldots, \pi_p$, and any
probability vector $\lambda = (\lambda_1, \ldots, \lambda_p)$, we define their mixture
$\pi' = \lambda_1 \pi_1 + \ldots + \lambda_p \pi_p$. The following two observations are immediate:


1. For each partition of E, there is a unique CAR mechanism that has
exactly that partition as its support (for each set A in the partition,
$\pi_A^x = \pi_A = 1$, for all $x \in A$).



2. Each finite mixture of CAR mechanisms again represents a CAR mech- 
anism. 

These two observations suggest a simple procedural CAR model: Fix some
integer p > 0 and pick p (arbitrary) partitions $\mathcal{E}_1, \ldots, \mathcal{E}_p$ of E. Each of
these induces a unique corresponding CAR mechanism. Now fix an arbitrary
distribution $\lambda = (\lambda_1, \ldots, \lambda_p)$ on $\mathcal{E}_1, \ldots, \mathcal{E}_p$. The coarsened data are now
generated by first, independently of the underlying x, selecting one of the p
partitions according to the distribution $\lambda$. Then, within the chosen partition,
the unique A is generated which contains the underlying x. One can think
of each partition as a 'sensor' with the help of which the data are observed.
The procedure amounts to selecting a sensor completely at random, independently
of the underlying x generated by Nature. This procedural CAR
model is called the CARgen procedure by Grünwald and Halpern (2003).
The 'parameters' of this procedure are the number of partitions p, the partitions
$\mathcal{E}_1, \ldots, \mathcal{E}_p$ and the distribution $\lambda$. Clearly, for every setting of the
parameters, the resulting algorithm defines a CAR mechanism. One may be
tempted to think that, by an appropriate setting of the parameters, all CAR
mechanisms can be simulated by CARgen, but the following example shows
that this is not the case:



Example 1. (Gill et al., 1997) Let E = {1,2,3}, $A_{12}$ = {1,2}, $A_{23}$ = {2,3}
and $A_{31}$ = {3,1}. Consider the coarsening mechanism $\pi^*$ defined by

$$\pi^{*1}_{A_{12}} = \pi^{*2}_{A_{12}} = \pi^{*2}_{A_{23}} = \pi^{*3}_{A_{23}} = \pi^{*3}_{A_{31}} = \pi^{*1}_{A_{31}} = \frac{1}{2}, \qquad (3)$$

and $\pi^{*x}_A = 0$ for all other $x \in E$, $A \subseteq E$. By (1) it is immediately seen
that this is a CAR mechanism. But because the support of the mechanism
is not a union of partitions of E, it cannot be simulated by the CARgen
procedure.
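
The two claims of Example 1 can be checked mechanically. The following Python sketch is an illustration only and is not part of the original development: the dictionary layout and the function names (`is_coarsening_mechanism`, `is_car`, `partitions_within`) are assumptions made for this example. It verifies that $\pi^*$ satisfies (1), and that no sub-collection of its support partitions E, which is why no choice of partitions in CARgen can reproduce it.

```python
from fractions import Fraction
from itertools import combinations

E = (1, 2, 3)
A12, A23, A31 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 1})
half = Fraction(1, 2)

# pi_star[(x, A)] is the probability of observing A given outcome x, as in (3).
# Pairs (x, A) that are absent have probability 0.
pi_star = {(x, A): half for A in (A12, A23, A31) for x in A}


def is_coarsening_mechanism(pi, E):
    """Every observed set contains x, and the probabilities for each x sum to 1."""
    in_sets = all(x in A for (x, A) in pi)
    sums_to_one = all(
        sum(p for (y, _), p in pi.items() if y == x) == 1 for x in E
    )
    return in_sets and sums_to_one


def is_car(pi):
    """CAR: for each A, the probability of observing A is the same for all x in A."""
    support = {A for (_, A) in pi}
    return all(
        len({pi.get((x, A), Fraction(0)) for x in A}) == 1 for A in support
    )


def partitions_within(support, E):
    """All sub-collections of the support that form a partition of E."""
    found = []
    for r in range(1, len(support) + 1):
        for combo in combinations(support, r):
            # Sizes adding up to |E| together with full coverage forces disjointness.
            if sum(len(A) for A in combo) == len(E) and set().union(*combo) == set(E):
                found.append(combo)
    return found


print(is_coarsening_mechanism(pi_star, E), is_car(pi_star))
print(partitions_within([A12, A23, A31], E))
```

The partition check is a brute-force enumeration, which is adequate for a three-point sample space but grows exponentially with the size of the support.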

The example shows that the CARgen procedure is incomplete: there exist
CAR mechanisms which cannot be represented by any parameter setting
of CARgen. The question is now whether there exist 'natural' procedural
CAR models which are complete. In previous work, two candidates for
such models were proposed: Grünwald and Halpern's (2003) CARgen* (an
extension of CARgen described above) and Jaeger's (2005b) Propose-and-Test
model. Both of these suffer from the frailty property mentioned in the
introduction: rather than producing CAR mechanisms for all parameter settings,
the parameters need to be fine-tuned. In previous work, one other
procedural model has been proposed which, like CARgen, produces CAR
mechanisms for all settings of its parameters. However, as shown by Jaeger
(2005b), this randomized monotone coarsening model (Gill et al., 1997) is in
fact equivalent to CARgen: both can simulate exactly the set of 'coarsening
completely at random' (CCAR) mechanisms. In fact, Jaeger (2005b, Theorem 4.17)
shows that any CAR mechanism that is not CCAR is, in a certain
sense, nonrobust. For the details of Jaeger's definition of robustness we refer
to Jaeger (2005b). Briefly, he supposes that a CAR mechanism involves an
auxiliary randomization, and defines robustness in terms of robustness to
changes in the distribution of the auxiliary variable.

Jaeger's result suggests that there exists no procedural CAR model that is
both complete and does not require any parameter tuning. Yet in Section 4
we exhibit a simple extension of the CARgen procedure which achieves
exactly this, as long as we are able to sample from a uniform distribution.
The procedural model will be based on a geometric interpretation of CAR
which we present below.



3 A Geometric View of CAR 

We have already indicated that a finite mixture of CAR mechanisms $\pi$ is
itself a CAR mechanism. Hence, for a given finite sample space E the set
of all CAR mechanisms defined with respect to E forms a convex body in
Euclidean space. In Theorem 1 we show that this body is a polytope with
a finite number of extreme points, the vertices of the polytope. In order
to characterize these extreme points, we first note that the support of a
CAR mechanism is always a cover of E. With any cover of E we associate
its incidence matrix: the matrix M with rows indexed by $x \in E$, columns
indexed by A in the support, and elements $1_{\{x \in A\}}$. An incidence matrix of a
cover is a matrix of 0's and 1's with at least one 1 in every row and column.
We now use these incidence matrices to define extreme CAR mechanisms in
an algebraic way. Theorem 1 below states that these CAR mechanisms are
also extreme points in the geometric sense, justifying our terminology.

In the sequel, vectors are always column vectors, even if we lazily list the
elements in a row. $\mathbf{0}$ and $\mathbf{1}$ denote vectors of 0's and 1's respectively, whose
length depends on the context.

Take the incidence matrix M of an arbitrary cover $(A_1, \ldots, A_m)$ of E.
If the equation Mz = 1 has a nonnegative solution, then this solution z =
$(z_1, \ldots, z_m)$ represents a CAR mechanism $\pi$, where for any $A_j$ appearing in
the cover, $z_j = \pi_{A_j}$, and for any A not appearing in the cover, $\pi_A = 0$ (see
also Grünwald and Halpern (2003), who explain this in detail). We call $\pi$ a
CAR mechanism corresponding to M.

Definition 1. We call $\pi$ an extreme CAR mechanism if it corresponds to an
incidence matrix M of a cover $(A_1, \ldots, A_m)$ such that Mz = 1 has a unique,
and strictly positive, solution.

By definition, a CAR mechanism is extreme if and only if it is the only
CAR mechanism with the same support. It is easily checked that the mechanism
$\pi^*$ of Example 1 is an example of an extreme CAR mechanism: it is
the only CAR mechanism with support $\{A_{12}, A_{23}, A_{31}\}$. The uniqueness also
implies that the support of an extreme CAR mechanism cannot have more
than n elements (the size of E). It is clear that the number of extreme CAR
mechanisms, for given E, is finite. We can find them all by enumerating and
testing all covers of E with $m \leq n$ elements.
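
The enumerate-and-test idea can be sketched as follows. This is a minimal illustration under our own assumptions, not an implementation taken from the paper: the helper names (`solve_unique`, `is_extreme_cover`, `extreme_supports`) are hypothetical, the linear system Mz = 1 is solved by exact Gauss-Jordan elimination over rationals, and the enumeration is brute force, so it is only practical for very small E.

```python
from fractions import Fraction
from itertools import combinations


def solve_unique(M, b):
    """Return the unique solution of M z = b, or None if there is none or it is not unique."""
    n_rows, n_cols = len(M), len(M[0])
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    pivot_row = 0
    for col in range(n_cols):
        pr = next((r for r in range(pivot_row, n_rows) if aug[r][col] != 0), None)
        if pr is None:
            return None  # no pivot in this column: M lacks full column rank
        aug[pivot_row], aug[pr] = aug[pr], aug[pivot_row]
        piv = aug[pivot_row][col]
        aug[pivot_row] = [v / piv for v in aug[pivot_row]]
        for r in range(n_rows):
            if r != pivot_row and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    # Remaining rows read 0 = rhs; any nonzero rhs means the system is inconsistent.
    if any(aug[r][n_cols] != 0 for r in range(pivot_row, n_rows)):
        return None
    return [aug[c][n_cols] for c in range(n_cols)]


def is_extreme_cover(E, cover):
    """Definition 1: M z = 1 has a unique solution and it is strictly positive."""
    M = [[Fraction(int(x in A)) for A in cover] for x in E]
    z = solve_unique(M, [Fraction(1)] * len(E))
    return z is not None and all(v > 0 for v in z)


def extreme_supports(E):
    """Brute-force list of all supports of extreme CAR mechanisms on E."""
    subsets = [frozenset(c) for r in range(1, len(E) + 1) for c in combinations(E, r)]
    return [
        cover
        for m in range(1, len(E) + 1)
        for cover in combinations(subsets, m)
        if is_extreme_cover(E, cover)
    ]


supports = extreme_supports((1, 2, 3))
print(len(supports))
```

Collections that fail to cover E are rejected automatically, since the corresponding row of M is zero and Mz = 1 then has no solution.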

Theorem 1. Every CAR mechanism is a mixture of extreme CAR mecha- 
nisms. 

In other words, all CAR mechanisms can be represented by randomly 
choosing, independently of x, one of a finite set of extreme CAR mechanisms. 
In the next section, we show that all such extreme mechanisms are of a 
simple and natural form. This will lead to Theorem 2, a direct corollary of
Theorem 1, giving an algorithmic characterization of CAR.



4 An Algorithmic View of CAR

Our procedure is based on the notion of a uniform multicover, which we
now define. A k-multicover of E, or just k-cover for short, is a collection
of nonempty subsets of E, allowing multiplicities, such that for each $x \in E$,
precisely k of the sets (some of which may be the same) contain x. Thus a
1-cover is an ordinary partition of E. By a uniform multicover we mean a
k-cover for some $k \geq 1$. The height of a uniform multicover is its value of k.
The support of a multicover is the set of subsets of E in the multicover.

A k-cover is specified by its support and by the multiplicity of each set
in its support. Thus, to each nonempty subset A of E there corresponds a
nonnegative integer $n_A$ such that $n_A = 0$ if A is absent from the k-cover,
otherwise $n_A > 0$ is the multiplicity of A in the k-cover. The $n_A$ have to
satisfy

$$\sum_{A \ni x} n_A = k \quad \forall x \in E. \qquad (4)$$

For a given k-cover we can now define a CAR mechanism by setting

$$\pi_A = n_A / k \quad \forall A \subseteq E. \qquad (5)$$

The algorithmic interpretation is as follows: Nature generates some $x \in E$.
The coarsening mechanism investigates which A in the uniform multicover
contain x. There are exactly k such A, including multiplicities, whatever
x. We choose one of these uniformly at random, i.e. each A with $x \in A$ is
chosen with probability 1/k.

Conversely, any CAR mechanism for which all the CAR probabilities $\pi_A$
are rational numbers is generated by a k-cover with k equal to the lowest
common multiple of the denominators of the $\pi_A$. We call CAR mechanisms
obtained in this way rational. The rational CAR mechanisms are precisely
the CAR mechanisms generated by a uniform multicover. Note that if k and
all $n_A$ share a common factor, we can divide by this factor without changing
the $\pi_A$. We consider such multicovers as equivalent and take the multicover
with the smallest k as representative of the class. In this way, each rational
CAR mechanism corresponds to exactly one uniform multicover, and vice
versa. We can make the connection to Theorem 1 by noting that

Fact 1. Every extreme CAR mechanism is rational. Thus, it is generated by 
a uniform multicover. 

This follows directly from the fact that the matrix M in Definition 1 is a
0/1-matrix and the solution of Mz = 1 is unique: a uniquely solvable linear
system with integer coefficients has a rational solution.

As stated above, for each rational CAR mechanism there is a unique 
uniform multicover which generates it. We can thus define an "extreme mul- 
ticover" as a uniform multicover that generates an extreme CAR mechanism. 
Using Theorem 1, it is easily shown that extreme multicovers are just those 
uniform multicovers that do not contain a subset that is also a uniform mul- 
ticover (we omit the details of the reasoning). 
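
The correspondence between rational CAR mechanisms and uniform multicovers described in this section can be made concrete with a short Python sketch. It is an illustration under our own conventions rather than code from the paper: a mechanism is assumed to be a dictionary mapping each set A (a `frozenset`) to a `Fraction` $\pi_A$, and the function names are hypothetical. The height k is taken as the lowest common multiple of the denominators, and the multiplicities are $n_A = k \pi_A$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def multicover_from_rational(pi):
    """Map a rational CAR mechanism {A: pi_A} to (k, {A: n_A}) with pi_A = n_A / k."""
    denominators = [p.denominator for p in pi.values()]
    k = reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)
    multiplicities = {A: int(p * k) for A, p in pi.items() if p > 0}
    return k, multiplicities


def mechanism_from_multicover(k, multiplicities):
    """Inverse direction: pi_A = n_A / k for every set in the multicover."""
    return {A: Fraction(n, k) for A, n in multiplicities.items()}


def is_uniform_multicover(E, k, multiplicities):
    """Each x must be contained in exactly k sets, counted with multiplicity."""
    return all(
        sum(n for A, n in multiplicities.items() if x in A) == k for x in E
    )


E = (1, 2, 3)
pairs = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 1})]
pi_star = {A: Fraction(1, 2) for A in pairs}
k, n = multicover_from_rational(pi_star)
print(k, is_uniform_multicover(E, k, n))
```

Because `Fraction` stores values in lowest terms, the k computed here is the smallest height in the equivalence class of multicovers generating the mechanism.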

We may now define a procedural CAR model by first fixing a finite number
p of arbitrary uniform multicovers $C_1, \ldots, C_p$. We then fix an arbitrary
distribution $\lambda = (\lambda_1, \ldots, \lambda_p)$ on $C_1, \ldots, C_p$. The coarsened data are now
generated by first, independently of the underlying x, selecting one of the
p uniform multicovers according to the distribution $\lambda$. Suppose we have
chosen multicover $C_j$ with height $k_j$. Then among the $k_j$ sets in $C_j$ which
contain x, we choose one uniformly at random, with probability $1/k_j$. This
procedural CAR model is a simple extension of CARgen (Section 2), where
the role of partitions is taken over by the more general uniform multicovers.
Like CARgen, it simulates a CAR mechanism for all parameter settings; no
fine-tuning is needed. Theorem 2 below, part 2 (a corollary of Theorem 1),
states that by appropriately setting the parameters, we can simulate all CAR
mechanisms. Before presenting the theorem, we continue our example.

Example 2. (Example 1 continued) The collection $C = \{A_{12}, A_{23}, A_{31}\}$ is
a uniform multicover of E with height 2. Consider a simple instantiation of
the procedural CAR model we described above, with just one multicover $C = C_1$,
so that $\lambda = (1)$. For each x chosen by Nature, there will be exactly two
elements of C which contain x. We select between these with probability 1/2.
It is immediately clear that this algorithm simulates the CAR mechanism $\pi^*$
described in Example 1. An implementation of this mechanism requires a
fair coin toss. If the coin is biased the CAR property can be lost. Relatedly,
the mechanism is not robust in Jaeger's sense.
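
The procedural model and its instantiation in Example 2 can be sketched in Python as follows. This is a hedged illustration, not the authors' code: a multicover is assumed to be a list of `frozenset`s in which a set of multiplicity m simply appears m times, and the names `coarsen` and `implied_probabilities` are hypothetical. The first function simulates one coarsening step; the second computes the exact conditional probabilities the procedure induces, so that the CAR property can be checked without sampling.

```python
import random
from fractions import Fraction


def coarsen(x, multicovers, weights, rng=random):
    """Pick a multicover according to `weights`, then a uniformly random member containing x."""
    cover = rng.choices(multicovers, weights=weights, k=1)[0]
    candidates = [A for A in cover if x in A]  # repeats count with multiplicity
    return rng.choice(candidates)


def implied_probabilities(E, multicovers, weights):
    """Exact probabilities pi[(x, A)] of observing A given x under the procedure."""
    pi = {}
    for cover, w in zip(multicovers, weights):
        for x in E:
            candidates = [A for A in cover if x in A]
            for A in candidates:
                pi[(x, A)] = pi.get((x, A), 0) + Fraction(w) / len(candidates)
    return pi


E = (1, 2, 3)
C = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 1})]

pi = implied_probabilities(E, [C], [Fraction(1)])
print(pi[(1, frozenset({1, 2}))])

rng = random.Random(0)
print(coarsen(2, [C], [1.0], rng))
```

Note that the multicover is chosen without looking at x, and that the only randomness depending on x is a uniform draw; this is the sense in which no parameter needs tuning.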
Theorem 2.

1. Every CAR mechanism can be arbitrarily well approximated by a rational
CAR mechanism, i.e. for all CAR mechanisms $\pi$, all $\epsilon > 0$, there
exists a rational CAR mechanism $\pi'$ such that $\|\pi - \pi'\| < \epsilon$.

2. Every CAR mechanism is exactly equal to a finite mixture of extreme 
(and hence rational) CAR mechanisms. 

We extensively discuss this theorem in the next section. 



5 Discussion 

Theorem 2 shows that there is an easy probabilistic algorithm which approximates
each CAR mechanism arbitrarily well, and that a randomized version
of the algorithm reproduces each one exactly. Since the rational numbers
form a dense subset of the reals, part 1 of Theorem 2 is, in a sense, trivial.
The real innovation is part 2, which shows that each CAR distribution
can be represented exactly as a mixture of a finite set of candidate rational
mechanisms.

No fine tuning of parameters is required to ensure the CAR properties,
so the algorithms do have a robustness property. We just need to be able
to choose uniformly at random from a finite set. Of course, if one perturbs
the uniform distribution over the k sets containing a point x, one will in
general destroy the CAR property; this is the reason that our result does not
contradict Jaeger's (2005b) Theorem 4.17. For this reason, some readers may
not want to call the procedure 'robust'. However, the (weaker) claim that the
algorithm requires no parameter tuning seems indisputable: we can hardly
think of implementing a uniform distribution as 'parameter tuning'. Unlike
the parameters in earlier complete procedural CAR models, which could
vary from situation to situation and were hard to determine, the uniform
distribution is universal and easy to determine. If the device we use to
generate a uniform distribution does not work perfectly, our procedural model
will slightly violate CAR, hence one might perhaps say it is 'nonrobust';
but devices used to generate a uniform distribution (coins, dice) exist, and
usually do not arise as fine-tuned versions of devices that can generate a
whole range of distributions; hence one cannot say that our model requires
'fine tuning'.

The reason that earlier complete procedural CAR models did require
parameter tuning was that their parameters had to satisfy complicated constraints
(see, for example, Example 4.7 in Jaeger (2005b)). As remarked by
M. Jaeger, we do pay a price for avoiding these parameter constraints: we
now have complicated constraints (4) on multiplicities of sets appearing in
multicovers. Such constraints are arguably more natural than constraints on
continuous-valued parameters, at least as long as the multicovers involved are
not too complex. Unfortunately, in order to span all CAR mechanisms, we
sometimes need highly complex multicovers, as we show below. This limits
the importance of our procedural model, as we discuss further below.

We can measure the complexity of multicovers in terms of their height.
Since the row rank of M equals its (full) column rank, m, we can delete rows
obtaining an m × m nonsingular matrix $M_0$. Deleting the corresponding rows
from $\mathbf{1}$ also, we obtain $z = M_0^{-1} \mathbf{1}$. It follows by the standard expression of
matrix inverse in terms of determinants that the value of k in the corresponding
uniform multicover is bounded by m!. Hence, the height of the extreme multicovers that can be
defined on a sample space of size |E| = n is upper bounded by n!. But is this
too pessimistic? Unfortunately not, or at least, not significantly: our next
and last theorem gives an exponential lower bound on the maximal height
of an extreme multicover. It turns out that this grows at least as fast as
the celebrated Fibonacci numbers, defined as $F_1 = 1$, $F_2 = 1$, and for $j \geq 3$,
$F_j = F_{j-1} + F_{j-2}$.

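Theorem 3 below considers n × n matrices $S_n$ inductively defined as follows:
$S_1 = (1)$. For odd n, $S_{n+1}$ is constructed from $S_n$ by setting

$$S_{n+1} = \begin{pmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & S_n \end{pmatrix}.$$

For even n, $S_{n+1}$ is constructed from $S_n$ by setting

$$S_{n+1} = \begin{pmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & S_n \end{pmatrix}.$$

This is easier than it seems: the pattern should be obvious from the example
n = 9, shown in Figure 1.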

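Theorem 3. For odd n > 0, the equation $S_n z = \mathbf{1}$ has the unique solution

$$z = \left( \frac{F_{n-1}}{F_n}, \frac{F_{n-2}}{F_n}, \ldots, \frac{F_2}{F_n}, \frac{F_1}{F_n}, \frac{1}{F_n} \right),$$

so that $S_n$ represents an extreme point for sample spaces with size |E| = n,
with height $k = F_n$.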
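
As a sanity check on the statement of Theorem 3, the following Python sketch builds the matrices for small odd n and confirms that the stated vector solves the system. It rests on an assumption about the construction, namely the block recursion $S_{n+1} = \begin{pmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & S_n \end{pmatrix}$ for odd n and $S_{n+1} = \begin{pmatrix} 0 & \mathbf{1}^\top \\ \mathbf{1} & S_n \end{pmatrix}$ for even n, which is our reading of the definition; the function names are hypothetical. The sketch checks only that the vector is a solution, not that it is unique.

```python
from fractions import Fraction


def build_S(n):
    """Build S_n by the assumed block recursion, starting from S_1 = (1)."""
    S = [[1]]
    for m in range(1, n):  # construct S_{m+1} from S_m
        if m % 2 == 1:
            # odd m: first row (1, 0, ..., 0), first column zero below it
            S = [[1] + [0] * m] + [[0] + row for row in S]
        else:
            # even m: first row (0, 1, ..., 1), first column one below it
            S = [[0] + [1] * m] + [[1] + row for row in S]
    return S


def fib(j):
    """Fibonacci numbers with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(j - 1):
        a, b = b, a + b
    return a


def claimed_solution(n):
    """The vector (F_{n-1}/F_n, ..., F_1/F_n, 1/F_n) from Theorem 3."""
    fn = fib(n)
    return [Fraction(fib(j), fn) for j in range(n - 1, 0, -1)] + [Fraction(1, fn)]


def solves(S, z):
    """Check S z = 1 exactly."""
    return all(sum(s * v for s, v in zip(row, z)) == 1 for row in S)


for n in (1, 3, 5, 7, 9):
    print(n, fib(n), solves(build_S(n), claimed_solution(n)))
```

For n = 3 the matrix produced is the incidence matrix of the three two-point sets of Example 1, with solution (1/2, 1/2, 1/2) and height $F_3 = 2$.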

Figure 1: The matrix $S_9$, an example of the matrices $S_n$ figuring in Theorem 3.

The theorem implies that the maximal height of an extreme multicover
grows exponentially fast with n; also, the maximal needed multiplicity of a
set in an extreme multicover grows exponentially fast with n. We interpret
this result as follows.

Uniform multicovers are important in two ways: 

1. They lead to an attractive algorithmic characterization of CAR that
requires no fine-tuning of parameters (Theorem 2).

2. They induce a hierarchy of CAR models that could be of use in statis- 
tical applications. We elaborate on this below. 

Yet apart from these applications, the importance of uniform multicovers in
understanding CAR is limited: the maximal needed height of the multicover
grows exponentially fast with n, so though the idea of the algorithm is
simple, its detailed specification can be complex. Thus, we can neither say
that our characterization provides a truly simple description of every CAR
mechanism, nor that our multicover CAR mechanisms always correspond to
some 'natural' process. While it seems reasonable to suppose that low-height
multicovers may be good models for some processes occurring in nature, the
same cannot be said for exponentially high multicovers, and our Theorem 3
does show that we need to take these into account.



Jaeger's (2005b) robustness Theorem 4.17 suggests that the CAR mechanisms
occurring in nature are those generated by randomized 1-covers. Our
characterization nuances this somewhat, suggesting that in some situations
k-covers for small k > 1 may also be reasonable models. Indeed, the hierarchy
of CAR mechanisms induced by our algorithm suggests a statistical
estimation procedure for parsimoniously estimating CAR mechanisms and
their parameters. Such a procedure would penalize the fit of a proposed
CAR mechanism to the data. The penalization would be some function of
the number of extreme multicovers needed to express the mechanism, and
the height of each of these. Alternatively one could use just one multicover,
not necessarily extreme, and penalize its height. This could be done either
explicitly, by adding a regularization term to the likelihood, or implicitly, by
the use of suitable Bayesian priors.

Such procedures could be useful in practice if one seriously believed that 
the data is CAR but quite possibly, not CCAR. One could hope in this way 
to combine the advantages of asymptotic validity and even go for asymptotic 
efficiency, with good small sample behavior. However, our results can also 
be read in a different way. Though we found an appealing way to model
CAR, the fact remains that there do not seem to be many good reasons
in practice to assume CAR but not CCAR. Therefore, if one is
prepared to assume CAR, one is likely to be also prepared to assume CCAR. 
Though the distinction concerns a "nuisance" part of the model, and indeed, 
in likelihood approaches is invisible by the likelihood factorization implied 
by CAR, one can capitalize on the extra knowledge for instance in order to 
obtain better small sample properties of estimators, at the cost of loss of 
asymptotic efficiency. 

A final view is that the extra generality obtained by relaxing CCAR to 
CAR is illusory. If one does not believe in CAR, one has no option but to start
modelling and estimating the coarsening mechanism. Jaeger (2006a, 2006b)
has made some proposals in this direction which seem promising. Another
possibility, so far not explored, is to use the notion of relative rather than
absolute CAR introduced by Gill et al. (1997). The point of CAR is that
in likelihood inference, one can analyse coarsened data as if the coarsening
mechanism had been fixed in advance as any particular CAR mechanism, and 
specifically therefore, as if coarsening by an independently fixed-in-advance 
partitioning of the sample space. Relative CAR means CAR relative to some 
other specific (non CAR) coarsening mechanism: the likelihood factors; the 
interesting part is the same as if the data had been coarsened by the reference 
coarsening model; the nuisance part can be used for inference concerning 
which coarsening mechanism has generated the data, out of the mechanisms 
in the family implied by the reference mechanism. It would be interesting to 
explore this possibility in more detail. 

6 Proofs 

6.1 Proof of Theorem 1

We show below that the set of all CAR mechanisms forms a convex polytope
and characterize the extreme points in terms of linear algebra, corresponding
to Definition 1.

A CAR mechanism is a collection of numbers $\pi_A$ indexed by the nonempty
subsets A of a finite set E. They must satisfy two sets of constraints: the
inequalities $\pi_A \geq 0$ for each A, and the equalities $\sum_{A \ni x} \pi_A = 1$ for each
x, both of which are obviously linear. Together the constraints imply that
$\pi_A \leq 1$ for all A. Collecting the $\pi_A$ into a vector $\pi$ we see that the set of all
$\pi$ is a convex, compact polytope since it is bounded and is the intersection
of a finite number of closed half-spaces (one for each inequality constraint)
and hyperplanes (one for each equality constraint). Hence each $\pi$ is a convex
combination of the extreme points of the polytope, of which there are a finite
number in total.

The polytope lives in the affine subspace of all vectors $\pi$ satisfying the
equality constraints $\sum_{A \ni x} \pi_A = 1$ for each x. Since $\pi$ has $2^{|E|} - 1$ components
(the number of nonempty subsets of E) and there are |E| constraints, it
follows that the dimension of this affine subspace is $2^{|E|} - 1 - |E|$. The
polytope is just the intersection of that affine subspace with the positive
orthant. Within the affine subspace, each face of the polytope corresponds
to one of the hyperplanes $\pi_A = 0$. Each vertex of the polytope is the unique
meeting point of a number of faces; one for each A such that $\pi_A = 0$. Thus
to each vertex is associated a collection of subsets A such that if we set the
corresponding $\pi_A$ equal to 0 in the equations $\sum_{A \ni x} \pi_A = 1$ for all x, there is
a unique and strictly positive solution in the remaining $\pi_A$. Conversely, any
such collection of A defines a vertex.

The subsets A not in the collection define the support of the extreme
CAR mechanism $\pi$ under consideration. Let M be its incidence matrix:
the matrix of zeros and ones with rows indexed by elements $x \in E$, columns
indexed by A in the support, and with entries $1_{\{x \in A\}}$. Write $\pi_0$ for the vector
of $\pi_A$ for A in the support. In matrix form, the equations which must have
a unique and positive solution $z = \pi_0$ can be written

Mz = 1, (6)

and we have proved that there is a one-to-one correspondence between vertices
of the polytope and incidence matrices M of covers of E such that this
equation has a unique and positive solution. As we argued in Section 4, if
the solution is unique it has to be rational.

Combining these facts, extreme points of the polytope of CAR mech- 
anisms correspond to covers of E whose incidence matrix M is such that 
Mz = 1 has a unique solution, and the solution is strictly positive. 

Remark A condition equivalent to Mz = 1 having a unique positive solution
(Farkas' lemma in the theory of linear programming, Schrijver (1986,
Chapter 7)), is that M has full column rank, and, if y is such that (a)
$y^\top M \geq 0$, then (b) $y^\top \mathbf{1} \geq 0$, with equality in (b) implying equality in (a).
By arguments from integer programming (see again Schrijver (1986)) one
may restrict here to vectors y of integers. Jaeger gives a version of
this condition for the existence of a CAR mechanism with given support;
he does not demand full rank since he does not ask for uniqueness. Though
more combinatorial in nature, this version of the condition for extremality
does not seem to be much more useful, except perhaps for helping one to
show that certain covers do not lead to solutions.



6.2 Proof of Theorem H 

Theorem 2 is, in fact, a direct corollary of Theorem 1. Namely, each extreme 
point is rational and therefore corresponds to a uniform multicover. Every 
point in a polytope can be written as a mixture of its extreme points. This 
gives us item 2. Item 1 follows by considering the rational convex 
combinations of the extremes, which are dense in the set of all convex 
combinations. 
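
The step from a rational extreme point to a uniform multicover can be sketched in a few lines of Python. The sketch assumes that a uniform multicover of height k is a multiset of subsets of E in which every point is covered exactly k times; the definition itself is given earlier in the paper and is only paraphrased here. The function names and the example are hypothetical.

```python
from fractions import Fraction
from math import lcm

def multicover_from_weights(weights):
    """Scale rational weights z_A to integer multiplicities k * z_A,
    where k is the least common multiple of the denominators."""
    k = lcm(*(w.denominator for w in weights))
    multiplicities = [int(w * k) for w in weights]
    return k, multiplicities

def coverage(E, cover, multiplicities):
    """Number of times each point of E is covered, counting multiplicity."""
    return {x: sum(m for A, m in zip(cover, multiplicities) if x in A)
            for x in E}

# Hypothetical example: the solution (1/2, 1/2, 1/2) of Mz = 1 for the
# cover {1,2}, {2,3}, {1,3} of E = {1,2,3}.
E = [1, 2, 3]
cover = [{1, 2}, {2, 3}, {1, 3}]
weights = [Fraction(1, 2)] * 3
height, mult = multicover_from_weights(weights)
counts = coverage(E, cover, mult)
print(height, mult, counts)
```

Because Mz = 1 holds row by row, multiplying through by k makes every point's coverage count equal to k, which is why the resulting multiset is uniform under the assumed definition.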



6.3 Proof of Theorem 3 

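We prove the theorem by induction on n. For n = 1, the result trivially 
holds. Now suppose the result holds for $S_{n-1}$, for some even n > 1. Thus, 
$S_{n-1} q = \mathbf{1}$ has a unique solution 

$$q = (q_1, \ldots, q_{n-1}) = \left( \frac{F_{n-2}}{F_{n-1}}, \frac{F_{n-3}}{F_{n-1}}, \ldots, \frac{F_2}{F_{n-1}}, \frac{F_1}{F_{n-1}}, \frac{1}{F_{n-1}} \right). \tag{7}$$

We prove the theorem by showing that this implies that 

$$S_{n+1} r = \mathbf{1} \tag{8}$$

has the unique solution: 

$$r = (r_1, \ldots, r_{n+1}) = \left( \frac{F_n}{F_{n+1}}, \frac{F_{n-1}}{F_{n+1}}, \ldots, \frac{F_1}{F_{n+1}}, \frac{1}{F_{n+1}} \right). \tag{9}$$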

To prove (9), note first that to each row of (8) corresponds a linear equation. 
Writing the equations corresponding to the first two rows explicitly and the 
equations corresponding to rows 3 to n + 1 in matrix form, and reordering 
terms, we see that (8) is equivalent to: 

$$r_2 = 1 - \sum_{i=3}^{n+1} r_i, \tag{10}$$

$$r_2 = 1 - r_1, \tag{11}$$

$$S_{n-1} (r_3, \ldots, r_{n+1})^{T} = (1 - r_1)\,\mathbf{1}, \tag{12}$$

where, by our inductive assumption, the last equality implies 

$$(r_3, \ldots, r_{n+1}) = (1 - r_1)\,(q_1, \ldots, q_{n-1}), \tag{13}$$

and in particular 

$$\sum_{i=3}^{n+1} r_i = (1 - r_1) \sum_{i=1}^{n-1} q_i. \tag{14}$$

Combining (11) with (10), we get $r_1 = \sum_{j=3}^{n+1} r_j$. Plugging this into (14) gives 

$$\frac{r_1}{1 - r_1} = \sum_{j=1}^{n-1} q_j, \tag{15}$$



where the $q_j$ are given by (7). We claim this has the unique solution 
$r_1 = F_n/F_{n+1}$. To see this, note the following basic fact, which follows 
immediately from repeatedly substituting the definition $F_n = F_{n-1} + F_{n-2}$ 
on the left in (16): 



Fact 2. For odd n > 0, 

$$F_n = \sum_{i=1}^{n-2} F_i + 1. \tag{16}$$
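
The identity (16) is easy to check numerically. The short Python sketch below does so under the assumption that the Fibonacci numbers are indexed with F_1 = F_2 = 1 (the convention is fixed earlier in the paper and is assumed here); the helper `fib` is hypothetical. The check runs over odd n, as in the statement of the fact.

```python
def fib(n):
    """n-th Fibonacci number, assuming the convention F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def fact2_holds(n):
    """Check F_n = sum_{i=1}^{n-2} F_i + 1 (an empty sum counts as 0)."""
    return fib(n) == sum(fib(i) for i in range(1, n - 1)) + 1

# Verify the identity for odd n from 1 to 29.
all_ok = all(fact2_holds(n) for n in range(1, 30, 2))
print(all_ok)
```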

The fact implies that the right-hand side of (15) is equal to $F_n/F_{n-1}$. 
Plugging in our proposed solution $r_1 = F_n/F_{n+1}$, the left-hand side of (15) 
becomes $F_n/(F_{n+1} - F_n) = F_n/F_{n-1}$, so that (15) holds. This shows that $r_1$ 
is indeed given by (9). By (11) it now follows that $r_2 = F_{n-1}/F_{n+1}$, 
and, by (13), that for $j \in \{3, \ldots, n+1\}$, $r_j = q_{j-2}\, r_2 = s_{j-2}/F_{n+1}$, where 
$(s_1, s_2, \ldots, s_{n-2}, s_{n-1}) = (F_{n-2}, F_{n-3}, \ldots, F_1, 1)$. This shows that (9) is the 
unique solution of $S_{n+1} r = \mathbf{1}$, and thus completes the induction step. The 
theorem is proved. 
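
The scalar identities used in the induction step can be verified with exact rational arithmetic. The Python sketch below builds q as in (7) and r as in (9) and checks relations (10), (11), (13) and (15). It does not construct the matrices S_n, whose definition lies outside this section, so it confirms only the arithmetic and not uniqueness. It assumes F_1 = F_2 = 1, and the helper names are hypothetical.

```python
from fractions import Fraction

def fib(n):
    """n-th Fibonacci number, assuming the convention F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def q_vector(n):
    """q = (F_{n-2}, ..., F_1, 1) / F_{n-1}, as in (7); length n - 1."""
    nums = [fib(i) for i in range(n - 2, 0, -1)] + [1]
    return [Fraction(v, fib(n - 1)) for v in nums]

def r_vector(n):
    """r = (F_n, ..., F_1, 1) / F_{n+1}, as in (9); length n + 1."""
    nums = [fib(i) for i in range(n, 0, -1)] + [1]
    return [Fraction(v, fib(n + 1)) for v in nums]

def step_identities_hold(n):
    q, r = q_vector(n), r_vector(n)
    r1, r2, tail = r[0], r[1], r[2:]
    return (
        r2 == 1 - sum(tail)                              # (10)
        and r2 == 1 - r1                                 # (11)
        and tail == [(1 - r1) * qj for qj in q]          # (13)
        and r1 / (1 - r1) == sum(q)                      # (15)
        and sum(q) == Fraction(fib(n), fib(n - 1))       # via Fact 2
    )

# Even n, matching the induction hypothesis in the proof.
checked = all(step_identities_hold(n) for n in range(2, 21, 2))
print(r_vector(4), checked)
```

For n = 4, for instance, the sketch gives r = (3/5, 2/5, 1/5, 1/5, 1/5), with denominator F_5 = 5.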



7 Acknowledgments 

We would like to thank Sasha Gnedin and Lex Schrijver for stimulating 
conversations. Lex Schrijver made some essential contributions to the proof 
of Theorem El 

This work was supported by the IST Programme of the European 
Community within the PASCAL Network of Excellence, IST-2002-506778. RDG's 
work was supported (during his previous affiliation) by the Dept. of 
Mathematics, Utrecht University, and, thanks to a visiting position, by the 
Thiele Centre, Århus University. CWI is the Dutch national research institute 
for mathematics and computer science. Eurandom is funded by the Dutch 
science foundation, NWO, and Eindhoven University. 






References 



Gill, R., M. van der Laan, and J. Robins (1997). Coarsening at random: 
Characterisations, conjectures and counter-examples. In D. Lin (Ed.), 
Proceedings First Seattle Conference on Biostatistics, New York, pp. 255-294. 
Springer. 

Grünwald, P. and J. Halpern (2003). Updating probabilities. Journal of 
Artificial Intelligence Research 19, 243-278. 

Heitjan, D. and D. Rubin (1991). Ignorability and coarse data. The Annals 
of Statistics 19, 2244-2253. 

Jaeger, M. (2005a). Ignorability for categorical data. The Annals of 
Statistics 33(4), 1964-1981. 

Jaeger, M. (2005b). Ignorability in statistical and probabilistic inference. 
Journal of Artificial Intelligence Research 24, 889-917. 

Jaeger, M. (2006a). The AI & M procedure for learning from incomplete 
data. In R. Dechter and T. Richardson (Eds.), Proceedings of the Twenty- 
Second Conference on Uncertainty in Artificial Intelligence (UAI 2006), 
pp. 225-232. 

Jaeger, M. (2006b). On testing the missing at random assumption. In 
J. Fürnkranz, T. Scheffer, and M. Spiliopoulou (Eds.), Machine 
Learning: ECML 2006, Seventeenth European Conference on Machine Learning. 
Volume 4212 of Lecture Notes in Computer Science, Berlin. Springer. 

Schrijver, A. (1986). Theory of Linear and Integer Programming. Chichester: 
John Wiley and Sons. 






